PREFACE


This  volume  is  not  a  work  on  statistics.  Its  sole  aim 
is  to  enable  the  many,  who  have  perforce  to  deal  with 
statistical  data  without  having  any  aptitude  for  statistical 
analysis,  to  understand  their  results  more  fully.  Those  who 
desire  a  detailed  knowledge  of  the  theory  of  statistics  are 
referred  to  any  standard  work  on  the  subject,  such  as  G.  Udny 
Yule’s  Introduction  to  the  Theory  of  Statistics. 

The  manuscript  has  been  read  through  by  three  friends : 
the  first  a  statistician,  the  second  a  physicist,  the  third  a 
biologist.  To  these  three,  Mr.  G.  Udny  Yule,  C.B.E.,  M.A., 
F.R.S.,  Mr.  A.  Ferguson,  M.A.,  D.Sc.,  and  Mr.  M.  Thomas, 
M.A.,  I  am  very  greatly  indebted  for  various  suggestions 
and  help.  At  the  same  time,  mine  is  the  responsibility 
should  any  errors  be  found. 
STATISTICAL METHOD


CHAPTER  I 

Tabulation  of  data  —  Frequency  distributions  — 
Goodness  of  Fit 

1.  Though  the  order  of  the  collection  of  data  is  often  of 
extreme  importance,  it  is  not  usually  one  which  lends  itself 
easily  to  statistical  manipulation.  The  data  must  usually 
be  rearranged. 

2.  When  numbers  are  large  the  data  should  be  grouped 
together  into  “classes”,  which,  for  convenience,  should  be 
equal.  The  number  of  classes  should,  as  a  rule,  be  not  more 
than  30.  These  classes  must  be  mutually  exclusive,  so  that 
no  individual  observation  can  be  included  in  more  than  one 
class;  and  their  limits  should  be  fully  stated.  For  example, 
if the classes are groups of 10, class intervals such as 0-10;
10-20; 20-30; would be incorrect, for it is not clear
which classes include the numbers 10, 20, 30. If the individual
observations are whole numbers, e.g. the number of
runs scored by an individual in a cricket match, the classes
might be 0-9; 10-19; 20-29; but if the individual
observations form a more or less continuous series, e.g.
the ages of school boys, the classes might conveniently be
(0 to less than 10); (10 to less than 20); (20 to less than 30).
There can then be no ambiguity. The classes may then be
indicated either by (0-); (10-); (20-); or by the mid
values of the classes, viz. 5; 15; 25; &c.

3. Table I is a frequency table which shows the classification
of cuckoo’s eggs according to their lengths. Table II
similarly shows the number of secondary schools in England
and Wales in 1906 classified according to the numbers of
children attending them. It has been prepared from the
actual number of children attending each school.

Table I. Length of cuckoo’s eggs*.

Length Class (mm.)   19.0  19.5  20.0  20.5  21.0  21.5  22.0  22.5  23.0
Frequency               1     3    33    39   156   152   392   288   286

Length Class (mm.)   23.5  24.0  24.5  25.0  25.5  26.0  26.5  Total
Frequency             100    86    21    12     2     0     1   1572

(This table shows that there are 1572 eggs, of which 39 are approximately
20.5 mm. in length, &c.)

Table II. Size of Secondary Schools.

Size Class (No. of pupils)    20   60  100  140  180  220  260  300  340
Frequency                     18  147  138  115   78   46   35   26   21

Size Class (No. of pupils)   380  420  460  500  540  580  620  660
Frequency                     12   10    9    6    3    1    4    1

Size Class (No. of pupils)   700  740  780  820  860  900  940  980  Total
Frequency                      1    2    0    0    1    1    0    1    676

(This table shows that there are 676 schools, of which 35 have an
attendance of 260, &c.)


4. In Table I the possible variations in length have been
divided into 16 classes, viz. 18.75 to less than 19.25 mm.,
19.25 to less than 19.75 mm., &c., each class being designated
by the middle of the interval. In Table II there are 25 classes,
1-40, 41-80, &c., designated 20, 60, &c. Any school with
an attendance lying between 41 and 80 is included in the
“60” class.

5.  An  examination  of  the  tables  shows  that  they  are  not 
of  the  same  form.  In  Table  I  the  maximum  frequency  is 
near  the  middle,  the  frequencies  tailing  off  fairly  evenly 
on  either  side.  In  Table  II  the  maximum  comes  very 
early,  and  the  tails  are  very  uneven. 

6.  There  are  many  well  known  mathematical  forms  of 
frequency  distributions**.  The  most  common  type  is  the 


* After O. H. Latter. Quoted from the B. A. Report on Biological
Measurements, 1927, p. 291.

**  Whittaker  and  Robinson,  Calculus  of  Observations,  p.  164. 
(In  future  references,  this  work  will  be  designated  as  W.  and  R.) 
“Binomial” distribution, which gives the frequencies with
which any phenomenon may be expected to occur as a
pure matter of chance, as for example the number of times
any given combination of faces may be expected to appear
when “n” dice are tossed together “N” times. The frequencies
are given by the successive terms of the series

$$N q^{n};\quad N\,n\,q^{n-1}p;\quad N\,\frac{n(n-1)}{1\cdot 2}\,q^{n-2}p^{2};\quad N\,\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,q^{n-3}p^{3};\ \ldots$$

the first term being the number of times there will be no
successes, the second the number of times there will be one
success, the third the number of times there will be two
successes, and so on, where “N” is the number of events,
“n” is the number of individuals, “p” is the chance of a
success, and “q” is the chance of a failure.


Example.  How  many  times  will  three  fours  appear  together 
when  five  dice  are  tossed  together  a  hundred  times  ? 

Here  N  (the  number  of  tosses)  is  100, 
n  (the  number  of  dice)  is  5, 

p  (the  chance  of  a  success  with  a  single  die)  is  1/6,  since 
of  the  six  faces  only  one  has  a  four  inscribed  thereon, 
similarly, 

q  (the  chance  of  a  failure  with  a  single  die)  is  5/6. 

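Now the frequency with which three successes may be expected
is given by the fourth term of the expression, i.e.

$$N\,\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,q^{n-3}p^{3}.$$

Substituting we get

$$100\cdot\frac{5\cdot 4\cdot 3}{1\cdot 2\cdot 3}\cdot\left(\frac{5}{6}\right)^{2}\left(\frac{1}{6}\right)^{3} = 3.2.$$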
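The arithmetic of this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `expected_frequency` and its argument names are not from the text, and the general term of the series is written with the standard binomial coefficient.

```python
from math import comb

def expected_frequency(trials, n, successes, p):
    """Expected number of trials, out of `trials`, that show exactly
    `successes` successes among n individuals (one term of the series)."""
    q = 1.0 - p  # chance of a failure
    return trials * comb(n, successes) * q ** (n - successes) * p ** successes

# Three fours among five dice, tossed together a hundred times.
print(round(expected_frequency(100, 5, 3, 1 / 6), 1))  # 3.2
```

Summing the function over 0 to n successes returns N, which is a convenient check that no term has been dropped.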


Exercises. 

1.  How  many  times  may  we  expect  two  6’s  at  a  time  when  tossing 
six  dice  one  hundred  times  ? 

2.  In  what  percentage  of  cases  may  we  expect  to  get  five  hearts  in 
eight  successive  random  draws  of  one  card  from  a  full  pack? 

3.  What  are  the  frequencies  to  be  expected  of  no  heads,  two  heads, 
&c.  when  tossing  ten  pennies  together  a  hundred  times  ? 
7. The distribution of frequencies obtained when
p = q = 1/2, as in exercise 3 above, is symmetrical. The
frequencies increase gradually from one extreme to the
middle and then tail off similarly as the other extreme is
reached. This is the most common type of frequency distribution,
and is termed the normal distribution. The heights
of adult men in any given town, the sizes of apples from
any given tree would probably conform to this distribution.

8. There are of course very many other frequency distributions.
For example in certain cases of Mendelian inheritance
the hybrids in the F2 generation would have, as a pure
matter of chance, the frequencies 9 : 3 : 3 : 1; and any other
frequency distribution might, under different circumstances,
be expected.

9. It is often necessary to test how closely an experimentally
found frequency distribution agrees with theory.

If $f$ is the observed frequency and $f_e$ is the expected
frequency in any one class, then

$$\frac{(f - f_e)^2}{f_e}$$

can be used as a measure of the difference between the two
frequencies, and is never negative. The sum of all these quantities for all
the classes is called $\chi^2$. It may be used to indicate the
closeness with which the two distributions approximate to
each other*. From the appropriate tables of $\chi^2$**, or from
the graph of Appendix I, it is possible to read off immediately
the probability of the differences between the observed and
expected frequencies occurring as a mere matter of chance.
The use of the graph may be illustrated by the following
example:

Ten coins were tossed together 50 times, and the frequencies
of 0, 1, 2, &c. heads were as shown in col. 2 of
the table below. How often would one expect a divergence
from the theoretical distribution as great as or greater than
this as a matter of chance?


*  W.  and  R.,  p.  338. 

**  K.  Pearson,  Tables  for  Statisticians  and  Biometricians,  Table  12, 
and  G.  Udny  Yule,  Journal  Royal  Statistical  Society,  Vol.  LXXXV, 
1922,  pp.  103—4. 
No. of    Observed     Expected
heads     freq. (f)    freq. (f_e)    (f - f_e)² / f_e

   0          0           0.05   }
   1          0           0.49   }    0.74²/2.74   =  0.20
   2          2           2.20   }
   3          6           5.86        0.14²/5.86   =  0.003
   4          6          10.25        4.25²/10.25  =  1.76
   5         20          12.30        7.70²/12.30  =  4.82
   6          6          10.25        4.25²/10.25  =  1.76
   7          5           5.86        0.86²/5.86   =  0.13
   8          5           2.20   }
   9          0           0.49   }    2.26²/2.74   =  1.86
  10          0           0.05   }

Total        50          50.00        χ² = 10.533

The expected frequencies, calculated as in para. 6, are
entered in col. 3. In col. 4 the squares of the differences
between columns 2 and 3, divided by the expected frequencies,
are entered. It will be noted that the data of the first three and
of the last three frequencies have been grouped together. This has
always to be done when the frequencies are small.

The sum of the quantities $(f - f_e)^2/f_e$, which is 10.53, is $\chi^2$.
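The calculation in the table can be reproduced as follows. This is a sketch under stated assumptions: the observed frequencies are those read from the table (0, 0, 2, 6, 6, 20, 6, 5, 5, 0, 0), the grouping of the first three and last three classes follows the text, and the names `observed`, `groups` and `chi_squared` are illustrative only. Because the expected frequencies are not rounded to two places here, the result differs slightly from the printed 10.53.

```python
from math import comb

tosses, coins = 50, 10
observed = [0, 0, 2, 6, 6, 20, 6, 5, 5, 0, 0]  # frequencies of 0..10 heads

# Expected frequencies from the binomial series with p = q = 1/2.
expected = [tosses * comb(coins, k) / 2 ** coins for k in range(coins + 1)]

# Classes with small frequencies are pooled, as in the text.
groups = [(0, 1, 2), (3,), (4,), (5,), (6,), (7,), (8, 9, 10)]

def chi_squared(observed, expected, groups):
    """Sum of (f - fe)^2 / fe over the pooled classes."""
    total = 0.0
    for group in groups:
        f = sum(observed[k] for k in group)
        fe = sum(expected[k] for k in group)
        total += (f - fe) ** 2 / fe
    return total

chi2 = chi_squared(observed, expected, groups)
print(round(chi2, 1))  # close to the 10.53 of the text
```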

10. Turning to Appendix I we see that it gives the value
of ‘P’, the probability that the variation occurs as a mere
matter of chance, for given values of $\chi^2$ and n'. n' is one
more than the number of terms which must be known for us
to be able to fill in the frequencies. In the example, through
grouping together the 0, 1, 2 and 8, 9, 10 frequencies, the
frequency table is reduced to seven rows. Since we know
that the total of the frequencies must be 50, we can fill in
the seven frequencies if we are given six of them. So n' in
this case must be 6 + 1 = 7.

For n' = 7, and $\chi^2$ = 10.53, the value of P is 0.1, which
means that we should expect a difference as great as or
greater than this on a fraction 0.1 of occasions, i.e. about once
in every ten repetitions of the experiment.
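Where the graph or tables are not to hand, P can be computed directly. The sketch below is not the author's method: it uses the standard closed form for the upper tail of the chi-squared distribution, taking the number of degrees of freedom to be n' - 1. The function name `chi2_tail_probability` is an assumption made for illustration.

```python
import math

def chi2_tail_probability(chi2, n_prime):
    """Probability of a chi-squared value at least as large as `chi2`,
    with n_prime - 1 degrees of freedom (n_prime as defined in the text)."""
    dof = n_prime - 1
    y = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)               # exact tail for 2 degrees of freedom
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # exact tail for 1 degree of freedom
    # Each step of the recurrence adds two degrees of freedom.
    while a < dof / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

print(round(chi2_tail_probability(10.53, 7), 2))  # 0.1
```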

11. Examples. (1) Certain characteristics are expected to occur
with frequencies 3 : 2 : 1. One hundred and twenty individuals were
examined and when classified according to the characteristics, the
distribution was found to be 68 : 34 : 18. What is the probability of
the difference being due to chance?


Observed     Expected
freq. (f)    freq. (f_e)    (f - f_e)² / f_e

    68           60         64/60 = 1.0667
    34           40         36/40 = 0.9000
    18           20          4/20 = 0.2000

Total 120       120         χ² = 2.1667

Here, knowing two of the frequencies we can fill in all, so
n' = 2 + 1 = 3;
therefore P = 0.33;

i.  e.  a  divergence  as  great  as  or  greater  than  this  might  be  expected 
once  out  of  every  three  times  as  a  mere  matter  of  chance. 

(2)  If  instead  of  examining  120  individuals  we  had  taken  1200 
and  had  found  the  same  proportions,  viz.  680  :  340  :  180,  can  we  say 
anything  further  as  to  the  difference  being  due  to  chance  ? 

In this case $\chi^2$ will be found to be equal to 21.67. For n' = 3,
it will be seen that the value of P is extremely small. (Reference to
Pearson’s table will show that its value is 0.000021; so that a divergence
as great as or greater than this would be expected twice in
100,000 times.) The difference is probably therefore not to be
explained as due to chance: the cause should be sought elsewhere.

These  examples  emphasize  the  importance  of  collecting  as  full 
data  as  may  be  possible. 


Exercises. 

1.  Five  dice  were  tossed  together  100  times,  and  the  frequencies 
with  which  4  or  5  or  6  appeared  were  as  follows: 

No. of successes
(4 or 5 or 6)       0    1    2    3    4    5   Total

Frequency           1   13   31   31   14   10    100

How  often  would  you  expect  such  a  divergence  from  the 
‘expected’  frequency  as  a  matter  of  chance  ? 

2.  Five  dice  were  tossed  50  times,  the  frequencies  of  the  appearance 
of  “4”  were : 

No. of successes
(4)                 0    1    2    3    4    5   Total

Frequency          17   20   12    1    0    0     50

What  is  the  probability  of  the  divergence  from  expectation 
being  due  to  chance  ? 
3. When drawing eight cards from a pack the frequencies with
which hearts appeared were:

No. of successes    0    1    2    3    4    5    6    7    8   Total

Frequency           8   28   31   21   10    2    0    0    0    100

Does  it  seem  probable  that  each  card  was  drawn  from  the 
full  pack  ? 

12.  The  same  method  can  be  used  to  test  agreement 
between  two  or  more  observed  frequency  series. 

Suppose  that  two  people  K  and  L,  claiming  the  power 
of  thought  transference,  have  been  tested.  They  have  been 
provided  with  a  paper  on  which  one  hundred  squares  have 
been  ruled,  and  have  been  required  to  write  either  A  or  B, 
but  not  both,  in  every  square. 

Suppose K has put down A 54 times and that on 30
of these occasions L has also put down A. We will assume
also that L has entered A 51 times altogether.

From  these  data  we  can  construct  the  following  table, 
showing  the  results: 


No. of times         No. of times K has written     Total of
L has written             A            B               L

A . . . . . .            30           21               51
B . . . . . .            24           25               49

Total of K . .           54           46              100

Since K has written A 54 times in 100 squares, the chance that
he has written A in any given square is 54/100, so that whenever L
writes A his chance of agreement with K is 54/100. Since L has written
A 51 times, the number of squares in which both may be expected to
write A is 51 × 54/100 = 27.54. Similarly the number of squares in which
both K and L may be expected to write B is 46 × 49/100 = 22.54.
In this way we find that the numbers to be expected as a pure
matter of chance are:


Expected No. of times    Expected No. of times K will write    Total of
L will write                  A              B                    L

A . . . . . .               27.54          23.46                51.00
B . . . . . .               26.46          22.54                49.00

Total of K . .              54.00          46.00               100.00
From this we get the following calculation of $\chi^2$:

Observed freq.    Expected freq.    (f - f_e)² / f_e
     (f)              (f_e)

     30               27.54         6.0516/27.54
     21               23.46         6.0516/23.46
     24               26.46         6.0516/26.46
     25               22.54         6.0516/22.54

Total 100            100            χ² = 0.975

Now in this example, since we know the totals of the
A’s and B’s written by K and L, fixing one frequency
enables us to calculate the remainder. So n' = 1 + 1 = 2.

From which we find P = 0.33, i.e. we might expect
such a correspondence 33 times out of every hundred
experiments.
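The construction of the expected table from the row and column totals, and the resulting value of chi-squared, may be sketched as follows. The function names `expected_table` and `chi_squared_table` are illustrative and not part of the text; rows are taken as L's entries (A, B) and columns as K's.

```python
def expected_table(observed):
    """Expected frequencies: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared_table(observed, expected):
    """Sum of (f - fe)^2 / fe over every cell of the table."""
    return sum((f - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for f, fe in zip(obs_row, exp_row))

observed = [[30, 21],   # L wrote A: K wrote A, K wrote B
            [24, 25]]   # L wrote B: K wrote A, K wrote B
expected = expected_table(observed)
chi2 = chi_squared_table(observed, expected)
print(round(expected[0][0], 2), round(chi2, 3))  # 27.54 0.975
```

The same two functions apply unchanged to tables with more than two columns, since only the marginal totals enter the expected values.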

Example.  In  certain  experiments  on  hearing*,  observers  reported 
that  they  sometimes  heard  two  sounds  and  sometimes  one  sound, 
while  sometimes  they  were  doubtful.  The  physical  conditions  were 
two,  A  and  B,  otherwise  conditions  were  identical.  Were  their  reports 
such  as  might  be  expected  as  a  mere  matter  of  chance  ? 

The  data  are  as  follows: 


Physical              Reports by observers
conditions     Two sounds   One sound   Doubtful   Total

A . . . .          38          105         17       160
B . . . .          21           66          4        91

Total              59          171         21       251

We see that two sounds were reported on 59 occasions out of 251.
Since condition A occurred 160 times it is probable that, as a mere
matter of chance, two sounds would be reported under condition A
on 160 × 59/251 = 37.6 occasions.

Similarly, as a mere matter of chance, since one sound was
reported 171 times, the number of times one sound might be expected
with condition A is 160 × 171/251 = 109.

From  these  two  expected  frequencies  we  can  calculate  the  rest, 
assuming  that  the  physical  conditions  are  not  significant.  The 
expected  frequencies  are: 


*  H.  Banister,  British  Journal  of  Psychology,  XVI,  1926,  291. 
               Two sounds   One sound   Doubtful   Total

Condition A       37.6         109        13.4      160
Condition B       21.4          62         7.6       91

Total             59           171        21        251

So  we  have 


Observed freq.    Expected freq.    (f - f_e)² / f_e
     (f)              (f_e)

     38               37.6           0.16/37.6
    105              109.0          16/109
     17               13.4          12.96/13.4
     21               21.4           0.16/21.4
     66               62.0          16/62
      4                7.6          12.96/7.6

Total 251            251            χ² = 3.088

Since two frequencies were necessary for the table to be completed,
n' = 2 + 1 = 3; whence it follows that P = 0.215; or
such a deviation might be expected in over 21% of cases as a pure
matter of chance. It is, therefore, not probable that the physical
conditions were in any way connected with the recorded hearing
of one or of two sounds*.


Exercises. 

1.  Under  two  physical  conditions,  C  and  D,  the  following  were 
the  reports  of  the  numbers  of  sounds  heard.  What  is  your  opinion 
about  the  results  ? 


Physical          No. of sounds reported
conditions      Two      One      Doubtful     Total

C . . . .        36      107         19         162
D . . . .        84       67         24         175

Total           120      174         43         337

* It is often difficult, in actual practice, to decide what value
of P should be taken as significant. As a rough rule it may be assumed
that the difference is not due to chance if P = 0.01 or less; but
it is advisable to weight one’s deductions against one’s hopes or
expectations.
2. The following table* gives the classification of 6,684 individuals
according to eye and hair colour. Does the distribution
appear to be due to chance?


                      Hair Colour
Eye Colour        Fair       Dark       Total

Light            2,714      3,129       5,843
Brown              115        726         841

Total            2,829      3,855       6,684

*  A.  Wolf:  Essentials  of  Scientific  Method,  p.  77. 




CHAPTER  II 


Representative  Measures 

13. When discussing results it is usual to take some
measure, actual or calculated, as a representative. The
representatives most usually taken are the Mode, the Median,
and the Arithmetic Mean.

14. The Mode is the “most fashionable” measure, in the
sense that it occurs most frequently; e.g. the mode of the
cuckoo’s eggs (Table I) is about 22.0 mm., for there are
most eggs of this size. The “most fashionable” school of
Table II has about 60 pupils in attendance. These values
are not exact, for they obviously depend on the classification
chosen. A closer estimate of the mode, for distributions that
are only moderately asymmetrical, is obtained from the empirical
equation

Mode = Mean - 3 (Mean - Median).

15. The Median is the value of the middle measure when
the measures are ranged in order of magnitude*. For the
measures 1, 3, 5, 8, 10, 12, 15 the median is 8, since there
are three measures on either side. For an even number of
measures (e.g. 1, 3, 5, 8, 10, 12, 15, 17) the median is taken
half way between the two middle measures (8 and 10), i.e. 9. If
the measures are given in the form of a frequency distribution
the median is obtained in a similar manner. For
Tables I and II the medians are the sizes of the 786th/787th egg
and of the 338th/339th school. These are 22.5 mm. and 140
pupils approximately, if we judge from our frequency tables
alone. If we took the actual eggs and schools we would
probably find that the values were slightly different from
these. The actual size of the median school is 133.
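Both cases can be illustrated in a few lines. The sketch below uses the standard library's `statistics.median` for a plain list, and a small helper, `median_class` (a hypothetical name), that walks the cumulative frequencies of a table to find the class holding the middle observation. It is only as precise as the classification, as the text notes, and it does not treat the rare case where the two middle observations fall in different classes.

```python
from statistics import median

print(median([1, 3, 5, 8, 10, 12, 15]))      # 8
print(median([1, 3, 5, 8, 10, 12, 15, 17]))  # 9.0

def median_class(classes, freqs):
    """Designation of the class that contains the middle observation."""
    middle = (sum(freqs) + 1) / 2  # e.g. 786.5 for 1572 observations
    running = 0
    for label, f in zip(classes, freqs):
        running += f
        if running >= middle:
            return label

# Table I: class mid-values in mm. and their frequencies.
egg_classes = [19.0 + 0.5 * i for i in range(16)]
egg_freqs = [1, 3, 33, 39, 156, 152, 392, 288, 286, 100, 86, 21, 12, 2, 0, 1]
print(median_class(egg_classes, egg_freqs))  # 22.5
```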


* W. and R., p. 188.
16. Both these representative measures are used, especially
the median. But the most useful representative is the Arithmetic
Mean. (There are other means, but they will not be
considered.) The value of the arithmetic mean (m) of a number
of quantities $a_1, a_2, a_3, a_4$, &c. is obtained by summing all
the quantities and dividing this sum by the number of the
quantities. Algebraically it is represented by $S(a)/n$, where
$S(a)$ indicates the sum of all the quantities $a_1, a_2, a_3$, &c., of
which there are n.

For  example  the  mean  of 

(3, 2, 7, -3, 6, -5, 4) is 14/7 = 2.

17.  Two  important  facts  about  the  mean  are: 

1.  The  sum  of  the  differences  of  the  individual  measures 
from  the  mean  is  zero; 

2.  The  sum  of  the  differences  of  the  individual  measures 
from  some  value  (A)  other  than  the  mean  is  equal  to  n  times 
the  difference  (d)  between  this  other  value  (A)  and  the  mean. 

These  facts  may  be  expressed  in  algebraic  symbols  as 
follows : 

1. If $x_1, x_2, x_3$, &c. are the deviations of the various
measures from the mean, then

$$x_1 + x_2 + x_3 + \ldots = 0;$$

or $S(x) = 0$, where $S(x)$ means the sum of the terms
$x_1, x_2, x_3$, &c.

2. If $d$ is the difference between the mean and some
other measure $A$, and if $\xi_1, \xi_2, \xi_3$, &c. are the differences
between the various measures and $A$, then

$$\xi_1 + \xi_2 + \xi_3 + \ldots = nd,$$
or $S(\xi) = nd$,
or $d = S(\xi)/n$.

This  last  equation  is  one  which  will  be  very  frequently 
used.  It  often  simplifies  the  calculation  of  the  value  of  the 
mean. 
Examples. 1. The differences between 2, the mean, and the
various measures of para. 16 are

1; 0; 5; -5; 4; -7; 2.

Their sum is zero.

2. The differences from a value of A = 4 are

-1; -2; 3; -7; 2; -9; 0.

$S(\xi) = -14$
But $n = 7$
Whence $d = -2$
∴ $m = 4 - 2$
     $= 2$
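The two facts of para. 17 and the short-cut they provide can be verified numerically. In this sketch the function name `mean_from_assumed_value` is an illustrative choice; it returns A + d with d = S(ξ)/n, where each ξ is a measure minus the assumed value A.

```python
def mean_from_assumed_value(measures, assumed):
    """Mean obtained as A + d, where d = S(xi)/n and xi = measure - A."""
    xi = [a - assumed for a in measures]
    d = sum(xi) / len(measures)
    return assumed + d

measures = [3, 2, 7, -3, 6, -5, 4]
print(sum(a - 2 for a in measures))           # 0, deviations from the mean cancel
print(mean_from_assumed_value(measures, 4))   # 2.0
```

Any convenient value of A gives the same mean, so A is usually chosen to keep the differences small.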


Exercises. 

1.  Find  the  mean  of  the  following  numbers: 

3; 8; -1; 0; 5; 6; 4; -2; 2; 3.

2. Show that $S(x) = 0$.

3. Find the value of the mean, calculating the sum of the differences
from the value (3), and applying the formula $S(\xi)/n = d$.

4.  Find  the  means  of  the  following  numbers: 

(a) 1.4; 1.1; 1.3; 0; 2.4; 2.3; 1.6; 0.8; 0.8

(b) 0.7; 0.9; 0; -0.2; 0.8; -0.7; -0.3; 2.2; 2.0

(c) 2.1; 2.0; 1.3; -0.2; 3.2; 1.6; 1.3; 3.0; 2.8

5.  Find  the  means  of  the  following  numbers,  which  are  the  scores 
obtained  by  two  separate  groups  of  army  recruits  in  a  test  of 
shooting  ability,  putting  A  =  25: 

Group  I.  20;  20;  20;  21;  23;  23;  23;  23;  26;  26;  26;  27; 

27;  28;  28;  29;  29;  29;  29;  30;  30;  30;  30;  30; 

31; 32; 33; 33; 34; 36

Group  II.  17;  19;  20;  20;  21;  22;  23;  25;  25;  26;  26;  27; 

30;  30;  33;  35 

18.  When  the  data  are  given  in  the  form  of  a  frequency 
distribution  the  mean  is  calculated  similarly. 

For  example,  find  the  mean  of  the  following: 

Length of line (cms.)    10.0   10.1   10.2   10.3   10.4

Frequency                   5     15     13     11      6
Here we have a length of 10 cms. five times, so the total
length of these five lines is 50 cms. Similarly for the other
frequencies.

It  is  usual  to  set  out  the  data  thus : 


Length (cms.)    Freq.    Freq. × Length

   10.0            5          50.0
   10.1           15         151.5
   10.2           13         132.6
   10.3           11         113.3
   10.4            6          62.4

Total             50         509.8

∴ Mean = 10.196 cms.

The value of the mean might also have been calculated
by finding the sum of the differences from some value,
e.g. 10.2, and applying the formula $d = S(\xi)/n$.

We  then  have 


   ξ         f        f × ξ

 -0.2        5        -1.0
 -0.1       15        -1.5
  0         13         0
  0.1       11         1.1
  0.2        6         1.2

Total       50        -0.2

∴ d = -0.004, and Mean = 10.2 - 0.004 = 10.196 cms.
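Both routes to the mean of a frequency distribution can be compared in code. The following is a sketch only; `frequency_mean` and `frequency_mean_from` are hypothetical names, and the small tolerance in the comparison allows for floating-point rounding.

```python
def frequency_mean(values, freqs):
    """Direct method: sum of (frequency x value) divided by total frequency."""
    return sum(f * v for v, f in zip(values, freqs)) / sum(freqs)

def frequency_mean_from(values, freqs, assumed):
    """Short method: A + d, with d = S(f x xi) / n and xi = value - A."""
    d = sum(f * (v - assumed) for v, f in zip(values, freqs)) / sum(freqs)
    return assumed + d

lengths = [10.0, 10.1, 10.2, 10.3, 10.4]   # cms.
freqs = [5, 15, 13, 11, 6]

print(round(frequency_mean(lengths, freqs), 3))             # 10.196
print(round(frequency_mean_from(lengths, freqs, 10.2), 3))  # 10.196
```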

19.  Sometimes  it  greatly  facilitates  arithmetical  work 
if  “Class  Intervals”  are  used  in  place  of  the  actual  class 
designation.  For  example  if  we  wish  to  calculate  the  mean 
length  of  the  cuckoo’s  eggs  from  the  data  of  table  I  we 
might  designate  the  19  mm.  class  as  Class  1 ;  the  19.5  mm. 
class  as  Class  2,  &c. 
The value of the mean is then obtained as follows:

Class    Freq.    Freq. × Class

   1        1            1
   2        3            6
   3       33           99
   4       39          156
   5      156          780
   6      152          912
   7      392         2744
   8      288         2304
   9      286         2574
  10      100         1000
  11       86          946
  12       21          252
  13       12          156
  14        2           28
  15        0            0
  16        1           16

Total    1572        11974

Mean = 11974/1572 = 7.62 class intervals.

i.e. the mean is 0.62 of a class interval beyond the
middle of the seventh. Now a class interval is 0.5 mm.

∴ 0.62 class interval = 0.31 mm.

∴ the mean is 22.31 mm.

N. B. Great care has always to be taken in converting
class intervals back to the actual measures. If the mean
came at the 4.13 class interval, then the mean is beyond
the middle of the fourth interval, whatever that may be, by
an amount 0.13 of a class interval.
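The conversion back from class intervals to actual measures is the step most prone to error, so it is worth setting out explicitly. The sketch below assumes, as in para. 19, that Class 1 is the 19.0 mm. class and that each class interval is 0.5 mm.; the names `coded_mean` and `decode` are illustrative.

```python
freqs = [1, 3, 33, 39, 156, 152, 392, 288, 286, 100, 86, 21, 12, 2, 0, 1]

def coded_mean(freqs):
    """Mean in class-interval units, the classes being numbered 1, 2, 3, ..."""
    return sum(c * f for c, f in enumerate(freqs, start=1)) / sum(freqs)

def decode(class_value, first_mid, width):
    """Convert a position in class units to the actual measure.
    Class 1 corresponds to `first_mid`; each further class adds `width`."""
    return first_mid + (class_value - 1) * width

m_classes = coded_mean(freqs)             # about 7.62 class intervals
mean_mm = decode(m_classes, 19.0, 0.5)    # about 22.31 mm.
print(round(m_classes, 2), round(mean_mm, 2))
```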

20.  The  arithmetical  work  would  have  been  further 
simplified  if  the  distance  of  the  mean  from  the  seventh 
class  had  been  calculated.  The  sixth  class  would  then  be 
put  —  1 ;  the  fifth  class  would  be  —  2 ;  &c. ;  while  the  eighth 
class  would  be  +  1;  &c.,  so  that  the  larger  frequencies  are 
multiplied  by  the  smaller  numbers. 

As an example the value of the mean school of Table II,
taking distances from the fourth class (the “140” class, of which
the frequency is 115), may be calculated.
Class difference      Freq.
from class 4 (ξ)       (f)       (f × ξ)      (f × ξ²)*

      -3                18         -54           162
      -2               147        -294           588
      -1               138        -138           138
       0               115           0             0
       1                78          78            78
       2                46          92           184
       3                35         105           315
       4                26         104           416
       5                21         105           525
       6                12          72           432
       7                10          70           490
      &c.              &c.         &c.           &c.

Total                  676      956 - 486       7328

d = (956 - 486)/676
  = 0.695 class intervals
  = 27.8 units,
mean = 140 + 27.8
     = 167.8

Hence the mean school is one with 167.8 pupils in
attendance.
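The completed working can be checked as below. This is a sketch under a stated assumption: the frequencies of the last eight classes of Table II are read as 1, 2, 0, 0, 1, 1, 0, 1, a reading that reproduces the printed totals 676, 956 and 486. The function name `mean_from_assumed_class` is illustrative.

```python
# Table II frequencies for the classes 20, 60, ..., 980 (25 classes).
# The last eight values are an assumed reading of the printed table.
freqs = [18, 147, 138, 115, 78, 46, 35, 26, 21,
         12, 10, 9, 6, 3, 1, 4, 1,
         1, 2, 0, 0, 1, 1, 0, 1]

def mean_from_assumed_class(freqs, assumed_index, assumed_mid, width):
    """Return (positive sum, negative sum, mean), taking class differences
    from the class at `assumed_index` (counting from 0)."""
    n = sum(freqs)
    pos = sum(f * (i - assumed_index) for i, f in enumerate(freqs) if i > assumed_index)
    neg = sum(f * (assumed_index - i) for i, f in enumerate(freqs) if i < assumed_index)
    d = (pos - neg) / n                    # in class intervals
    return pos, neg, assumed_mid + d * width

pos, neg, mean_school = mean_from_assumed_class(freqs, 3, 140, 40)
print(pos, neg, round(mean_school, 1))  # 956 486 167.8
```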


Exercises. 

1.  Complete  the  work  in  the  example  above. 

2. Find the value of the mean egg, para. 19, by calculating the distance
of the mean from the eighth class.

3.  Exercises  of  para.  11, 

a)  What  was  the  mean  number  of  successes  when  tossing  the 
five  dice?  Take  distances  from  the  fourth  class  in  1,  and 
from  the  first  class  in  2. 

b)  What  was  the  mean  number  of  hearts  ? 

4. What are the mean ages at death of** American and other psychologists?


*  This  column  will  be  explained  later,  vide  para.  28. 

**  E.  G.  Boring,  Psychological  Bulletin,  XXV,  1928,  p.  302. 
                    No. of recorded deaths of
Age at death    American psychologists    Other psychologists

   30-                    1                        0
   35-                    7                        3
   40-                    5                        1
   45-                    2                        2
   50-                    9                        3
   55-                    5                        2
   60-                    4                        4
   65-                    6                        5
   70-                    5                        4
   75-                    3                       10
   80-                    3                        7
   85-                    0                        4
   90-                    0                        2

Total                    50                       47

5.  1,434  prawns  were  examined*  and  found  to  have  dorsal  teeth 
as  follows: 

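    2 prawns had 1 dorsal tooth each
   23 prawns had 2 dorsal teeth each
  103 prawns had 3 dorsal teeth each
  533 prawns had 4 dorsal teeth each
  681 prawns had 5 dorsal teeth each
   89 prawns had 6 dorsal teeth each
    3 prawns had 7 dorsal teeth each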

What  was  the  average  number  of  dorsal  teeth  ? 


*  A.  Wolf,  Essentials  of  Scientific  Method,  p.  37. 
Measures of Dispersion

21.  In  the  biological  sciences  it  is  difficult  to  attach  any 
precise  significance  to  any  mean,  stated  alone,  since  the 
variations  in  the  individual  observations  may  be  very  great. 
Even when the whole series of observations is given (and
this should always be done if possible) it is useful to indicate
concisely their scatter or “dispersion”. There are
various  measures  of  dispersion.  The  chief  are  the  Quartiles, 
the  Mean  Variation,  and  the  Standard  Deviation. 

22.  The  Quartiles*.  If  the  whole  observations,  arranged 
in  order  of  magnitude,  are  divided  up  into  four  equal  portions 
(numerically)  the  values  of  the  three  observations  which 
mark  the  divisions  are  the  Quartiles.  The  second  quartile 
is  the  Median,  and  the  values  of  the  first  and  third  quartiles 
obviously  give  some  indication  of  the  variations  found :  they 
denote  the  limits  between  which  half  the  observations  lie. 
The  smaller  the  Inter-quartile  Range,  i.  e.  the  difference 
between  the  values  of  the  first  and  third  quartiles,  the 
greater  the  reliance  which  can  be  placed  on  the  median. 

When the distribution is normal, the median is the mean.
Half the interquartile range is then 0.6745 × (standard deviation);
the standard deviation will be discussed later, para. 24
et seq. So, for rough laboratory purposes, the standard
deviation can be obtained from the quartiles.

Example. In the data of Table I the first and third quartiles are
obviously about 21.8 and 22.9. Therefore 1.1 = 2 × 0.6745 σ (σ is
the usual sign for the standard deviation);

or σ = 0.81 mm.


*  W.  and  R.,  p.  185. 
Exercise.

Data  of  Table  II,  para.  3.  From  the  quartile  readings,  what  is 
the  value  of  the  standard  deviation  ? 

23. The Mean Deviation*, or Mean Variation (m. v.), as it is often called in psychological text books, is the mean of the deviations of the individual observations from their arithmetical mean. The sign of the deviation, whether positive or negative, is neglected.

Example. Find the mean deviation of the numbers
3, 2, 7, -3, 6, -5, 4 (para. 16).

The mean is 2. So the deviations of the individual observations, neglecting sign, are

1, 0, 5, 5, 4, 7, 2,

of which the mean is 24/7 = 3.429.

Therefore the m. d. is 3.429.
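The same calculation can be written as a minimal Python sketch; the function name `mean_deviation` is chosen here for illustration only.

```python
def mean_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


data = [3, 2, 7, -3, 6, -5, 4]
md = mean_deviation(data)
print(f"mean deviation = {md:.3f}")
```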

Exercise. 

Find the mean deviations of the numbers given in exercise 1 of para. 17: 3, 8, -1, 0, 5, 6, 4, -2, 2, 3;

and of: 2, -3, 6, 9, 8, 9, 12, -7.

24. Standard Deviation ($\sigma$)**. This is the most usual measure of the dispersion of the various observations, because it lends itself to mathematical treatment. The mean variation is the average difference between the various observations and their mean; the standard deviation is obtained by taking the squares of the differences from the mean, finding their average, and then taking the square root of the average: i. e. s. d. $(\sigma) = \sqrt{S(x^2)/n}$, where $S(x^2)$ is the sum of the squares of the differences between the individual observations and the mean.

Example. Find the s. d. of the data of the example in para. 23.

The squares of the differences from the mean are

1, 0, 25, 25, 16, 49, 4;

Therefore the s. d. $= \sqrt{120/7} = \sqrt{17.14} = 4.1$

* W. and R., p. 184.

** W. and R., p. 183.

N. B. The s. d. is a kind of average of the deviations, and must lie somewhere between the greatest and the least: e. g. if by some blunder a value 8.3 had been arrived at in this example it would obviously be wrong, for no deviation exceeds 7. Three deviations are greater, and four are less, than the true value of $\sigma$, 4.1. In a long series, about two thirds of the deviations, roughly, will be less than $\sigma$. The value obtained for $\sigma$ should always be examined to see whether it looks reasonable.
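The definition of para. 24 and the sanity check of the N. B. can be sketched as follows. Both function names are illustrative assumptions; the divisor is $n$, as in the text.

```python
import math


def standard_deviation(values):
    """Square root of the average squared deviation from the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def looks_reasonable(values, sigma):
    """The s.d. must lie between the least and the greatest absolute deviation."""
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]
    return min(deviations) <= sigma <= max(deviations)


data = [3, 2, 7, -3, 6, -5, 4]
sd = standard_deviation(data)
print(f"s.d. = {sd:.1f}, reasonable: {looks_reasonable(data, sd)}")
print(f"a blundered value of 8.3 reasonable: {looks_reasonable(data, 8.3)}")
```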

Exercises. 

1. Find the s. ds. of the exercises of para. 23, viz.:

(a) 3, 8, -1, 0, 5, 6, 4, -2, 2, 3;

(b) 2, -3, 6, 9, 8, 9, 12, -7.

2. Find the s. ds. of exercises 4, a, b, c of para. 17.

25. It will be noted that the question of the sign of the difference from the mean is overcome in this case by squaring. It will also be noted that the s. d., as compared with the m. v., gives more importance or “weight” to those observations which differ largely from the mean, so that the s. d. is greater than the m. v. As a rough rule the m. v. $= 0.8\,\sigma$, a relation which is of use when checking calculations.

26. If differences are taken from some measure (A) other than the mean, it can easily be shown that the sum of the squares of the differences so obtained is greater than the sum of the squares of the differences from the mean by $nd^2$, where $n$ is the number of observations and $d$ is the difference between (A) and the arithmetic mean (vide para. 17).

Adopting the symbols of para. 17 we have
$$S(x^2) = S(\xi^2) - nd^2.$$
But
$$n\sigma^2 = S(x^2), \text{ by para. 24.}$$
Therefore
$$n\sigma^2 = S(\xi^2) - nd^2.$$

Example.  Find  the  s.  d.  of  the  numbers 

3,  2,  7,  —3,  6,  —5,  4; 

calculating from a value (A) = 3.

It is useful to make all calculations of s. d. in the form given below.

Readings      ξ (+)     ξ (-)      ξ²
    3           .         .         0
    2           .         1         1
    7           4         .        16
   -3           .         6        36
    6           3         .         9
   -5           .         8        64
    4           1         .         1
Total 14        8        15       127

Taking differences from (A) = 3, we have
$$d = S(\xi)/n = -7/7 = -1,$$
whence we have, mean $= 3 - 1 = 2$,
and
$$n\sigma^2 = S(\xi^2) - nd^2 = 127 - 7 = 120,$$
$$\therefore\ \sigma = \sqrt{120/7} = 4.1,$$
which is the result we obtained previously, vide para. 24.
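The short-cut of para. 26 can be sketched in Python as below. The function name `sd_from_origin` is an illustrative assumption; the arithmetic follows the relation $n\sigma^2 = S(\xi^2) - nd^2$ given above.

```python
import math


def sd_from_origin(values, origin):
    """Mean and s.d. computed from differences taken from an arbitrary origin A."""
    n = len(values)
    xi = [v - origin for v in values]      # differences from A
    d = sum(xi) / n                        # difference between the mean and A
    n_sigma_sq = sum(x * x for x in xi) - n * d * d
    return origin + d, math.sqrt(n_sigma_sq / n)


mean, sd = sd_from_origin([3, 2, 7, -3, 6, -5, 4], origin=3)
print(f"mean = {mean}, s.d. = {sd:.1f}")
```

Any convenient origin gives the same result; choosing one near the mean merely keeps the squares small.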

Exercises. 

1. Find the s. d. of each of the following series of numbers:

(a) 3, 8, -1, 0, 5, 6, 4, -2, 2, 3,

calculating from A = 3;

(b) 2, -3, 6, 9, 8, 9, 12, -7,

calculating from A = 4;

(c) 5, 7, -4, 3, -5, 4, 2, -3, -4, 5,

calculating from A = 4.

2. Find the s. ds. of the data of the shooting tests, vide exercise 5, para. 17.

27. In a frequency distribution, it is evident that every member of a class at a distance $\xi$ from the value (A) will contribute the same amount $\xi^2$ to the total. If there are $f$ members in the class, the total contribution of the class will be $(f\xi^2)$. The s. d. will therefore be found by substituting $S(f\xi^2)$ for $S(\xi^2)$. That is, the formula of para. 26 becomes
$$n\sigma^2 = S(f\xi^2) - nd^2.$$
The following example, a continuation of the one worked in the latter part of para. 18 (q. v.), will be easily followed:

$S(f\xi^2) = 0.2 + 0.15 + 0.11 + 0.24 = 0.70$

But $d^2 = 0.004^2 = 0.000016$

Therefore $n\sigma^2 = 0.70 - 0.0008 = 0.6992$, since $n = 50$

$\sigma^2 = 0.6992/50 = 0.013984$

$\sigma = 0.118$
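The formula for a frequency distribution can be sketched as follows. The function name `grouped_sd` and the small frequency table are hypothetical, invented purely for illustration; they are not the data of para. 18.

```python
import math


def grouped_sd(class_values, frequencies, origin):
    """Mean and s.d. of a frequency distribution, working from an origin A."""
    n = sum(frequencies)
    pairs = list(zip(class_values, frequencies))
    d = sum(f * (v - origin) for v, f in pairs) / n           # S(f xi) / n
    s_f_xi2 = sum(f * (v - origin) ** 2 for v, f in pairs)    # S(f xi^2)
    return origin + d, math.sqrt((s_f_xi2 - n * d * d) / n)


# Hypothetical distribution: class values 1..5 with these frequencies.
values = [1, 2, 3, 4, 5]
freqs = [2, 5, 10, 5, 2]
mean, sd = grouped_sd(values, freqs, origin=2)
print(f"mean = {mean:.2f}, s.d. = {sd:.3f}")
```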

28. In many cases arithmetical work is simplified if class intervals are used, and if differences are taken from the largest class. The last column of para. 20 will now be understood.

It will be seen that $S(f\xi^2) = 7328$

Now $d = 0.695$

$\therefore\ d^2 = 0.4830$

$n\sigma^2 = S(f\xi^2) - nd^2 = 7328 - 676 \times 0.4830$

$\sigma^2 = 10.3572$

$\sigma = 3.22$ class intervals $= 128.8$ pupils.


Exercises. 

1.  Find  the  s.  d.  of  exercises  2  and  3  at  the  end  of  para.  20. 

2.  Find  the  s.  ds.  of  the  data  of  exercise  4,  para.  20. 

3.  Find  the  s.  ds.  of  the  data  of  exercise  5,  para.  20. 

29. We have seen in para. 7 that all normal frequency distributions are similar. The maximum frequency is at the middle, and the value at which it occurs is the mean. As we proceed farther and farther from the mean the frequencies decrease symmetrically on either side of the mean: in other words, the number of individuals which are greater than the mean by a certain amount ($x$) is exactly equal to the number of individuals which are less than the mean by the same amount ($x$). Further, it is found that, if the variations from the mean are measured in terms of the s. d. ($\sigma$), the percentage number of individuals which vary from the mean by a certain amount is the same for all normal distributions.

So, if these percentages have been once obtained, the number of individuals which are included within a range varying by ($x$) from the mean can be determined by dividing the value of ($x$) by the value of ($\sigma$), to bring it to units of ($\sigma$), and looking up the percentage in the table. These percentages have been calculated by Prof. K. Pearson*, and are shewn graphically in Appendix II.

The graph shows, for example, that 47.7% of the observations are included between the mean and a distance ($x/\sigma = 2$) to one side of the mean; that over 49% fall between the mean and a distance ($x/\sigma = 3$); while 25% are within a distance ($x/\sigma = 0.67$) to one side of the mean. This distance is the probable error** (p. e.). Fifty per cent of the observations lie within the range ($x/\sigma = \pm 0.67$) since twenty five per cent lie on either side of the mean.

Example. In a series of 1500 observations, which follow the normal distribution, the mean is 837 and the s. d. 11. How many observations are probably greater than 859?

Here $x = 859 - 837 = 22$;

$x/\sigma = 2$.

From Appendix II we see that 47.7% of the observations lie between the mean and $x/\sigma = 2$.

Since 50% of the observations are less than the mean, it follows that 2.3% are greater than 859;

i. e. $2.3 \times 1500/100 = 34.5$ observations will be greater than 859.
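Where the graph of Appendix II is not to hand, the same percentage can be computed directly. The sketch below uses the complementary error function from Python's standard library in place of the table; the function name `fraction_beyond` is an illustrative assumption.

```python
import math


def fraction_beyond(x_over_sigma):
    """Fraction of a normal distribution lying more than x/sigma above the mean."""
    return 0.5 * math.erfc(x_over_sigma / math.sqrt(2))


# 1500 observations, mean 837, s.d. 11: how many exceed 859?
expected = 1500 * fraction_beyond((859 - 837) / 11)
print(f"about {expected:.1f} observations")
```

This gives about 34 observations; the 34.5 of the example arises from rounding the percentage to 2.3 when reading the graph.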


* K. Pearson: Tables for Statisticians and Biometricians; Table II.

** So called because, if any observation is taken at random, the chances are equal that it will lie within or without this distance of the mean. (W. and R., p. 184.)

Exercises.

1. If the s. d. of a series of observations is 8, what percentage of the observations may be expected to differ from the mean by over 12?

2. The mean of a series is 95 and the s. d. is 10. What percentage of the observations will probably be over 90?

3. 20% of a series of observations, whose mean is 850, are greater than 858.5 units. What is the s. d. of the series?

4. What are the p. es. of the data of exercises 2 and 3 of para. 28?


CHAPTER  IV 


The  significance  of  the  mean 

30. If a series of ($n$) observations of a given variable have a mean ($m$), it does not follow that the same value of the mean will be obtained if another series of $n$ observations be taken at random from the same population. The means so obtained will vary, though the variation to be expected will be less the greater the number of observations taken. It is usual to express the deviations between the means in terms of the “Standard Error” (s. e.) of the mean, which will be denoted by the symbol ($\sigma_m$).

It has been found that
$$\sigma_m = \sigma/\sqrt{n},\ *$$
where ($\sigma$) is the s. d. of the population. ($\sigma_m$) is, obviously, smaller than ($\sigma$). When the s. d. of the population is not known, it is customary to use the s. d. of the ($n$) observations instead. This does not introduce an appreciable error if ($n$) is fairly large: the case when ($n$) is small will be dealt with in para. 32.

The methods used in the previous chapter for appraising the dispersion of observations which follow the normal distribution may be applied to the variations to be expected in the means, for the means will follow the normal distribution more closely than the individual observations do. In applying the formula, $x$ will be the variation of the mean of $n$ observations from the true mean, and $\sigma_m$ will be the measure of the dispersion.

Examples. 1. The mean of 64 observations is 2, and the s. d. is 0.5. Is the mean significant?

The mean will not be significant if there is much probability of it being zero. The s. e. of the mean ($\sigma_m$) $= \sigma/\sqrt{n} = 0.5/8 = 0.06$.

The chances of the mean being zero will be obtained from the value of $x/\sigma_m$ when $x = 2$ and $\sigma_m = 0.06$:
$$x/\sigma_m = 2/0.06 = \text{over } 33.$$
From Appendix II we see that the probability of this occurring as a matter of chance is very minute, so we may conclude that the mean is significant.

* W. and R., top of p. 184.

2. The s. d. of a large series of observations is 5 units. If we take 16 observations at random, what are the chances that their mean will vary from the true mean by more than 2 units?

s. e. of mean ($\sigma_m$) $= 5/\sqrt{16} = 1.25$

When $x = 2$,

$x/\sigma_m = 2/1.25 = 1.6$

From Appendix II we see that 44% of the means will lie between the true mean and a point this distance above it, so 6% will exceed the true mean by more than 2 units. Similarly 6% will fall short of the true mean by more than 2 units.

Hence the chances are 12 : 88, i. e. about 1 : 7, that the mean of 16 observations will vary from the true mean by more than 2 units.
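Example 2 can be checked with a short Python sketch, again substituting the error function of the standard library for the graph of Appendix II. The function name is an illustrative assumption.

```python
import math


def chance_mean_off_by_more_than(x, sigma, n):
    """Chance that the mean of n observations differs from the true mean
    by more than x in either direction, for a population s.d. of sigma."""
    standard_error = sigma / math.sqrt(n)
    ratio = x / standard_error
    return math.erfc(ratio / math.sqrt(2))   # both tails together


p = chance_mean_off_by_more_than(x=2, sigma=5, n=16)
print(f"chance = {100 * p:.0f} per cent")
```

The computed chance is about 11 per cent; the 12 per cent of the example comes from reading the graph to the nearest whole per cent on each side.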

3. The s. d. of a series of observations is 10 units; the mean about 80. How many observations should be taken to be fairly sure that the mean is correct to 2%?

(“Fair certainty” that the variation does not occur will be attained if $x/\sigma_m = 3$ units: the chances will be less than 1% that the mean of the sample varies from the true mean by an amount greater than $x$.)

The error allowable in the mean of the sample, i. e. $x$, is 2% $= 1.6$ units.

Now if $n$ observations are taken the s. e. of the mean ($\sigma_m$) $= 10/\sqrt{n}$. The minimum value of $n$ will be that which makes
$$x/\sigma_m = 3,$$
i. e. $1.6/\sigma_m = 3$; i. e. $1.6 \Big/ \dfrac{10}{\sqrt{n}} = 3$;

or $1.6\sqrt{n} = 30$;

or $n = 352$;
So at least 352 observations should be taken if we wish the mean to be correct, with fair certainty, to 2%.

Exercises.

1. Are the means obtained for the ages of psychologists at death (para. 20, exercise 4), and for the shooting test scores (para. 17, exercise 5) significant?

2. Data of table I (Cuckoo’s eggs).

The s. d. of a series is 0.97; and the mean is 22.31. What are the chances that the mean of 100 eggs, collected at random, does not differ from the true mean by more than 1%?

3. The s. d. of a series is 4 units. How many observations must be taken to be fairly sure that the mean is correct to one unit?

4. The s. d. of a series is 4 units; the mean is about 30. What number of observations will give a mean fairly correct to 5%?

5. Data of exercise 5, para. 20. If 100 other prawns taken at random had a mean of 5.5 teeth, would you consider they belonged to the same population?
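The reasoning of Example 3 of para. 30 (how many observations are needed for the mean to be correct within a stated amount) can be sketched as below. The function name `observations_needed` is an illustrative assumption, and the ratio of 3 is the text's criterion of “fair certainty”.

```python
import math


def observations_needed(sigma, allowed_error, ratio=3.0):
    """Smallest n for which allowed_error / (sigma / sqrt(n)) reaches the ratio."""
    return math.ceil((ratio * sigma / allowed_error) ** 2)


# s.d. 10 units, mean about 80, mean to be correct to 2 per cent (1.6 units).
n = observations_needed(sigma=10, allowed_error=0.02 * 80)
print(n)
```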

31.  The  above  method  of  testing  the  significance  of 
the  mean  holds  when  the  s.  d.  of  the  population  is  known, 
and  to  a  large  extent  also  when  the  s.  d.  of  a  fairly  large 
sample  only  is  known,  and  is  used  as  if  it  were  the  true 
s.  d.  of  the  population.  But  when,  as  so  often  happens, 
the  data  are  meagre,  there  is  more  risk  in  assuming  the 
s.  d.  of  the  sample  is  equal  to  that  of  the  “population”. 

32. A similar method has, however, been evolved for dealing with small samples. The main difference is that, in place of the true s. d. of the observations ($\sigma = \sqrt{S(x^2)/n}$), a greater value is ascribed to the s. d. of the sample by dividing $S(x^2)$ by ($n - 1$) instead of by ($n$), and, instead of using the table for ($x/\sigma$), a table for ($x/\sigma_s$), where $\sigma_s$ is the s. d. of the sample calculated as above, has to be consulted. The table, in a slightly different form, is given in Fisher’s Statistical Methods for Research Workers (1st Edition), Table IV; and in Biometrika, Vol. VI, 1-25; Vol. IX, 414-417.

The values of $x/\sigma_s$ between which and the mean 49.5% of the frequencies lie are shewn graphically, in Appendix III, for various values of $n$. For “fair certainty” at least these values of $x/\sigma_s$ are necessary.

Example. The mean of nine observations taken at random is 45; the s. d. of the sample ($\sigma_s$) is 10. (a) Is the mean significant? (b) What difference in the value of the mean may be taken as fairly significant?

From Appendix III we see that, for $n = 9$, a value of $x/\sigma_s = 3.35$ or greater will occur about once per cent, since 99% of observations lie within the range $x/\sigma_s = \pm 3.35$.

(a) The s. e. of the mean is
$$\sigma_s/\sqrt{n} = 10/3 = 3.33,$$
$$\frac{\text{mean}}{\text{s. e. of mean}} = \frac{45}{3.33} = 13.5,$$
which is significant.

(b) If $x$ is the significant difference in the mean, it must be such that
$$\frac{x}{\text{s. e. of mean}} = 3.35 \text{ as a minimum;}$$
i. e. $x = 3.35 \times 3.33 = 11.16$ as a minimum.
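The two parts of this example can be sketched in Python. The critical value 3.35 is simply copied from the text (Appendix III, $n = 9$) and is not computed here; the function name is an illustrative assumption.

```python
import math


def small_sample_ratio(mean, sigma_s, n):
    """Standard error of the mean and the ratio of the mean to it."""
    standard_error = sigma_s / math.sqrt(n)
    return standard_error, mean / standard_error


CRITICAL_FOR_N_9 = 3.35   # value quoted in the text from Appendix III

se, ratio = small_sample_ratio(mean=45, sigma_s=10, n=9)
significant = ratio > CRITICAL_FOR_N_9
least_significant_difference = CRITICAL_FOR_N_9 * se
print(f"s.e. = {se:.2f}, ratio = {ratio:.1f}, significant: {significant}")
print(f"least significant difference = {least_significant_difference:.2f}")
```

Carrying the standard error unrounded gives 11.17 for the least significant difference, against the 11.16 obtained with 3.33.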


Exercises. 

1. The following numbers express the frequency of stomata on the upper surface of leaves of a certain plant. What is the value of $\sigma_s$, and what are the probable limits between which the true value of the mean lies?

7.7; 7.3; 7.7; 7.0; 7.2; 7.6; 7.5; 7.1; 7.4; 7.8.

2. Data of exercise 4, para. 17.

(a) Calculate $\sigma_s$ for each of the three sets of observations;

(b) Are the means significant?


CHAPTER  V 


The  significance  of  the  difference  between  the 
means  of  two  samples 

33. It is easy, vide para. 30, to calculate the significance of the mean of a sample (that is, the closeness of its approximation to the true mean) if we know the s. d. of the series. But often, when two samples are taken, they have not the same s. d.; they may not belong to the same population and yet it is desired to know the significance of the difference ($d_m$) between the means of the two series.

34. This significance can be found if we know the variations which are to be expected between the differences of the means of the series of samples. For as the means of different samples vary at random between certain limits, so the differences between the means will vary. Sometimes the difference between the means will be small, sometimes it will be large. It is obviously possible to have some measure of the variation, namely the standard error of the difference of the means.

The standard error of the difference of the means ($\sigma_d$) will obviously vary with the magnitudes of the standard errors of the means of the two series ($\sigma_{m_1}$) and ($\sigma_{m_2}$). It is, in fact, given by the equation:
$$(\sigma_d)^2 = (\sigma_{m_1})^2 + (\sigma_{m_2})^2$$
When the standard error of the difference of the means is known, the significance of the actual differences can be obtained, as before, from the graph of Appendix II.

Example. What is the significance of the difference between the means of the following series?

Series     Mean      σ       No. of observations
  A         80       4            100
  B         73       4.5           81

The s. e. of the mean of A $= 4/\sqrt{100} = 0.4$

The s. e. of the mean of B $= 4.5/\sqrt{81} = 0.5$

Therefore (s. e. of diff. of means)$^2 = 0.4^2 + 0.5^2 = 0.16 + 0.25 = 0.41$

s. e. of diff. $= \sqrt{0.41} = 0.64$
$$\therefore\ \frac{\text{Difference between means}}{\text{s. e. of diff. between means}} = \frac{7}{0.64} = 11 \text{ approx.}$$
From Appendix II we see that the probability of such a difference being found is extremely small, so that this difference is probably not due to chance.
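The calculation of this example can be sketched in Python as follows; the function name `se_of_difference` is an illustrative assumption.

```python
import math


def se_of_difference(sigma_1, n_1, sigma_2, n_2):
    """Standard error of the difference between the means of two series."""
    se_1_squared = sigma_1 ** 2 / n_1     # square of s.e. of first mean
    se_2_squared = sigma_2 ** 2 / n_2     # square of s.e. of second mean
    return math.sqrt(se_1_squared + se_2_squared)


# Series A: mean 80, s.d. 4, 100 observations; series B: mean 73, s.d. 4.5, 81.
se_d = se_of_difference(4, 100, 4.5, 81)
ratio = (80 - 73) / se_d
print(f"s.e. of difference = {se_d:.2f}, ratio = {ratio:.1f}")
```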


Exercises. 

1.  20  observations  have  mean  =50;  s.  d.  =  2.32 
25  observations  have  mean  =48;  s.  d.  =  2.48 

Is  the  difference  likely  to  be  due  to  fluctuations  of  sampling  ? 

2.  Intelligence  tests  on  two  groups,  boys  and  girls,  give  the  following 
results.  What  is  the  probability  that  the  difference  is  due  to 
chance  ? 

Girls:  Mean  84  s.  d.  10  number  121 
Boys:  Mean  81  s.  d.  12  number  81. 

3. Single leaves taken at random from different plants are examined, with results as below. Is the difference in stomatal frequency significant? What difference is fairly significant?

No. of plants    Mean of stomatal freq.    s. d. of freq.    Treatment of plants
     100                 7.95                   0.22          medium watering
     121                 7.10                   0.25          little watering

4.  Is  the  difference  between  the  mean  ages  at  death  of  American 
and  other  psychologists  statistically  significant  (exercise  4, 
para.  20)  ? 

5.  Is  the  difference  between  the  means  of  the  scores  of  the  two 
groups  of  recruits  statistically  significant  (exercise  5,  para.  17)  ? 

35. If the number of observations is small, it is better to calculate the s. d. for the sample ($\sigma_s$) by dividing $S(x^2)$ by ($n - 1$) instead of by ($n$). The significance can then be obtained by reference to Appendix III. In this case the value of $n$ to be taken, when referring to the graph, is one less than the total number of observations.

Example. The data of exercise 1 of para. 32 refer to the stomatal frequency in a plant which had been watered copiously every day. In another plant, given a medium amount of water every day, the stomatal frequency was as noted below. Is the difference in stomatal frequency significant?

7.6; 8.0; 7.8; 7.9; 8.3; 7.9; 8.1; 8.2; 7.7; 8.0;

It will be found that mean $= 7.95$

and that $\sigma_s = \sqrt{S(x^2)/(n-1)} = 0.22$

$\therefore$ s. e. of mean $= 0.22/\sqrt{n} = 0.07$

$\therefore$ s. e. of difference of means $= \sqrt{0.08^2 + 0.07^2} = 0.107$.

Now difference of means $= 0.74$
$$\frac{\text{diff. of means}}{\text{s. e. of diff. of means}} = \frac{0.74}{0.107} = 6.9.$$

For $n = 20 - 1 = 19$ a value of $x/\sigma_s$ which is greater than 2.88 is fairly significant; it will not occur more than once in a hundred times. So we may conclude that the difference is significant.
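The mean, $\sigma_s$ and standard error quoted for the second plant can be verified with a short Python sketch. The function name `sample_sd` is an illustrative assumption; the divisor is ($n - 1$), as para. 35 directs.

```python
import math


def sample_sd(values):
    """s.d. of a small sample, dividing the sum of squares by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Stomatal frequencies of the plant given a medium amount of water.
medium = [7.6, 8.0, 7.8, 7.9, 8.3, 7.9, 8.1, 8.2, 7.7, 8.0]
mean = sum(medium) / len(medium)
sigma_s = sample_sd(medium)
se_mean = sigma_s / math.sqrt(len(medium))
print(f"mean = {mean:.2f}, sigma_s = {sigma_s:.2f}, s.e. = {se_mean:.2f}")
```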

36. It is not permissible, on the data as given, to conclude that the differences in the stomatal frequencies in the above example are in any way related to the changes in the amount of water given to the plant. For that, it would be necessary to show that the stomatal frequency in many plants grown with much water was significantly different from the stomatal frequency in other plants watered with a medium amount of water.

If, instead of considering one plant only, leaves from 100 “well-watered” plants and from 100 plants which had been given a medium amount of water had been examined, and if the plants had been chosen at random, and if the leaves had also been chosen at random, and if all the conditions were identical for the plants excepting only the amounts of water, we should be justified in drawing the conclusion that the differences in the stomatal frequencies were significant and were probably due in some way to the different treatment to which the plants had been subjected. The data would then have been treated as in para. 34. If leaves from a few plants only were compared the method of para. 33 would be used.


Exercises. 

1. The diameters of single roots from five plants living on land and from five similar plants living in water, are measured at points mid-way between the tip and the point of attachment to the stem. Assuming the roots are chosen at random, is the difference significant?

Diameter  of  root. 

Land  Form  335;  337;  358;  383;  362. 

Water  Form  385;  412;  420;  418;  380. 

2.  Data  of  exercises  4(b)  &  (c)  of  para.  17.  Is  the  difference 
between  the  means  significant  ? 


CHAPTER  VI 

Correlation 

37.  Different  characteristics  of  individual  members  of 
a  given  population  often  vary  together:  there  is  some 
relation,  more  or  less  exact,  between  them*.  For  example, 
older  children  are,  as  a  rule,  taller  than  younger  children. 
This  is  not  always  true,  but  as  a  general  statement  the 
stature  of  a  child  is  related  to  his  age.  Similarly  the  weight 
of  a  bullock  varies  with  his  shoulder  girth,  and  the  egg- 
producing  capacity  of  a  hen  may  vary  with  the  special 
feeding  it  is  given,  and  so  on. 

38. The closeness with which the variables are related may be expressed numerically. This numerical expression is called a coefficient of correlation ($r$). There are various correlation coefficients. We shall consider the one which holds, strictly, only when there is linear regression**. It has all possible values from +1 to -1. The value +1 indicates that the one variable always increases by the same amount for unit increase of the other, and decreases similarly as the other decreases. Where this correlation coefficient is found there are no exceptions. A value of -1 indicates that an increase in one variable is always connected with a decrease in the other, and vice-versa. If ($r$) is zero, the variables are said to be uncorrelated.

The greater the value of the coefficient ($r$), the closer is the relation between the variables, whatever the sign: the sign merely indicates that both variables vary together (+ ve.); or that as one increases the other decreases (- ve.).


* W. and R., p. 317.

** This limitation will not be discussed.

39. If the variables are the two series

$m_1 + x_1$; $m_1 + x_2$; $m_1 + x_3$; $m_1 + x_4$; &c., and

$m_2 + y_1$; $m_2 + y_2$; $m_2 + y_3$; $m_2 + y_4$; &c.,

where $m_1$ and $m_2$ are the arithmetic means of the two series, and $x_1, x_2, x_3$, and $y_1, y_2, y_3$, are the differences between the individual measures and their means, the correlation coefficient is given by the equation*,
$$r = \frac{S(xy)}{n\,\sigma_x\,\sigma_y},$$
where $S(xy)$ stands for the sum of the products of $x$ and $y$ for each individual, $n$ is the number of individuals, and $\sigma_x$, $\sigma_y$ are the s. ds. of the two series.

Example. The marks obtained by seven pupils in two subjects P and Q are given below. What is the correlation coefficient?

The calculation is most easily done as below:

Pupil    Mark in subject      x       y       x²       y²        x × y
            P       Q                                           +       -
  a        84      72       +31     +13      961      169      403
  b        75      43       +22     -16      484      256              352
  c        68      84       +15     +25      225      625      375
  d        52      91        -1     +32        1     1024               32
  e        46      58        -7      -1       49        1        7
  f        31      28       -22     -31      484      961      682
  g        15      37       -38     -22     1444      484      836
Total     371     413                       3648     3520     2303     384
Mean       53      59                    σx² = 521  σy² = 503   S(xy)/n = 274

The means of the marks obtained in the two subjects are 53 and 59; from these the deviations ($x$) and ($y$), the squares of these deviations, and their products, are obtained. The sum of the products is $S(xy) = 2303 - 384 = 1919$.

So $r = S(xy)/n\,\sigma_x\,\sigma_y = 274/(521 \times 503)^{1/2} = 0.54$,

which is a measure of the correlation between the marks of the seven pupils in the two subjects.

* W. and R., p. 326.
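The calculation for the seven pupils can be sketched in Python as below. The function name `correlation` is an illustrative assumption; the formula is that of para. 39, written with the sums of squares in place of $n\,\sigma_x\,\sigma_y$, which comes to the same thing.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation coefficient r = S(xy) / (n * sd_x * sd_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    # n * sd_x * sd_y equals sqrt(S(x^2) * S(y^2)), so n cancels out.
    return s_xy / math.sqrt(s_xx * s_yy)


marks_p = [84, 75, 68, 52, 46, 31, 15]
marks_q = [72, 43, 84, 91, 58, 28, 37]
r = correlation(marks_p, marks_q)
print(f"r = {r:.2f}")
```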


Exercises. 

1.  The  loss  of  water  through  transpiration  from  similar  twigs  of 
the  same  length,  but  with  different  numbers  of  leaves,  was  as 
noted  below.  What  is  the  correlation  between  the  weight  of  water 
lost  and  the  number  of  leaves  ? 

Loss  of  water  in  gms. 

40; 35; 42; 37; 32; 38; 41; 45; 39; 41;

No.  of  leaves. 

202;  196;  200;  187;  176;  205;  210;  221;  193;  190; 

2.  In  some  motor  ability  tests  the  number  of  taps  made  in  one 
minute  and  the  number  of  match  sticks  inserted  in  small  holes 
in  thirty  seconds  were  as  follows;  nine  subjects.  What  is  the 
correlation  between  the  results  of  the  two  tests  ? 

No.  of  taps 

358;  328;  363;  360;  403;  365;  372;  349;  387; 

No.  of  matches 

18 ;  19;  20;  17;  22 ;  18;  19;  19;  19 ; 

3. Fourteen members of a rifle club obtained the following scores (a) in a certain test, and (b) on the range. What is the correlation between the scores?

(a)  Test  Scores 

98  96  92  84  81  80  79  75  74  68  64  60  58  55 

(b)  Range  Scores 

26  36  18  30  43  70  74  68  138  44  172  168  45  76 

40. It usually happens that the arithmetical calculations are simpler when deviations are measured from some values (A and B) other than the means, vide para. 26. Adopting the notation of that para., we obtain a product term $S(\xi\eta)$ which is greater than the product term $S(xy)$ by ($n \times d_1 d_2$), where $n$ is the number of terms and $d_1$ and $d_2$ are the differences between the values of A and B and the means of the respective series ($a_1, a_2, a_3$, &c.) and ($b_1, b_2, b_3$, &c.).

The equation for the correlation coefficient,
$$r = S(xy)/n\,\sigma_x\,\sigma_y,$$
then becomes
$$r = [S(\xi\eta) - n\,d_1 d_2]/n\,\sigma_x\,\sigma_y.$$

Exercises.

1. Find the correlation between the Simple Reaction Times and the Tapping Test Scores of the fifteen subjects of whom data are given below. Calculate differences from 130 and 360.

Reaction Times (in 0.001 sec.): 146; 150; 162; 175; 117; 179; 131; 102; 132; 119; 119; 162; 179; 104; 140;

Tapping Test: 340; 337; 335; 273; 367; 358; 327; 363; 360; 403; 365; 348; 372; 349; 377;

2. What is the correlation between the following sets of readings*?

(A) 198; 194; 222; 204; 194; 198; 196; 210; 192; 202;

(B) 220; 198; 206; 234; 202; 199; 200; 233; 204; 214;
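The short-cut of para. 40, in which deviations are measured from arbitrary values A and B, can be sketched as follows. The function name and the origins 50 and 60 are illustrative assumptions; the data are the marks of the example in para. 39, so that the result can be compared with the 0.54 found there.

```python
import math


def correlation_from_origins(xs, ys, a, b):
    """r = [S(xi*eta) - n*d1*d2] / (n * sd_x * sd_y), with deviations
    xi and eta measured from arbitrary origins a and b."""
    n = len(xs)
    xi = [x - a for x in xs]
    eta = [y - b for y in ys]
    d1 = sum(xi) / n                                  # mean of x minus a
    d2 = sum(eta) / n                                 # mean of y minus b
    sd_x = math.sqrt(sum(p * p for p in xi) / n - d1 * d1)
    sd_y = math.sqrt(sum(q * q for q in eta) / n - d2 * d2)
    s_xi_eta = sum(p * q for p, q in zip(xi, eta))
    return (s_xi_eta - n * d1 * d2) / (n * sd_x * sd_y)


marks_p = [84, 75, 68, 52, 46, 31, 15]
marks_q = [72, 43, 84, 91, 58, 28, 37]
r = correlation_from_origins(marks_p, marks_q, a=50, b=60)
print(f"r = {r:.2f}")
```

The choice of origins does not affect the answer; round numbers near the means merely lighten the arithmetic.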

41.  With  a  frequency  distribution  the  details  of  the  work 
are  a  little  more  complicated.  The  method  usually  adopted 
is  shown  in  the  following  example. 

Example.  The  following  hypothetical  table  purports  to  show  the 
amount  of  sugar  in  gms.  found  in  100  plants  after  exposure  to  certain 
conditions  for  various  lengths  of  time.  What  correlation  is  there 
between  sugar  content  and  length  of  exposure  ? 


Taking 5 gms. and 5 days as arbitrary values A and B, the means and s. ds. may be calculated as follows, adopting the method of para. 27.

For the exposures there are 6 plants which were exposed for 2 days, so, for these 6, $\eta$ (the difference from B) is -3; similarly for the 11 plants exposed for 3 days, $\eta = -2$. The calculations may, therefore, be set out thus:


* van Heuven: British Journal of Psychology, XVII, 1927, p. 133.

$d_2 = -3/100 = -0.03$;

Mean exposure $= 4.97$ days

$\sigma_y^2 = 215/100 - (0.03)^2 = 2.15$

$\sigma_y = 1.47$

Similarly* it will be found that the mean amount of sugar per plant and the s. d. are 5.03 gms. and 1.55, and that $d_1 = +0.03$.

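The evaluation of the product terms ($\xi\eta$) is most easily made as follows.

Each compartment shows the number of plants, with the product ξη for that compartment in brackets; a dot marks an empty compartment.

Exposure                        Grams of Sugar
in days       2         3         4         5         6         7         8      Total
   2        1 (+9)    1 (+6)    2 (+3)    1 (0)       .       1 (-6)      .         6
   3        3 (+6)    2 (+4)    4 (+2)      .       2 (-2)      .         .        11
   4          .       4 (+2)    8 (+1)    3           .         .         .        15
   5        1         2         6        18         9         0         1          37
   6        1 (-3)    1 (-2)    0 (-1)    2         7 (+1)    3 (+2)      .        14
   7          .         .         .       1         1 (+2)    6 (+4)    5 (+6)     13
   8          .       1 (-6)      .         .         .       3 (+6)      .         4
 Total      6        11        20        25        19        13         6         100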

* The student should verify the calculation.

Rewrite the original table. Mark in heavy lines the arbitrarily
chosen values from which the means have been calculated, and enter,
in each compartment of the table, the value of the product term (ξη)
for that compartment. For example, every plant which has a sugar
content of 7 gms. and has been exposed to the special conditions
for 2 days, contributes (+2)(−3) to the total, because its sugar
content varies by (+2) gms. from the 5 gms., i.e. ξ = +2; while
the exposure of 2 days is 3 days less than the 5, i.e. η = −3. The
value (−6) is in consequence entered in that compartment. The
other quantities are obtained similarly.

The number of times each value of (ξη) appears is collected and
the value of S(ξη) is obtained thus:

  ξη        freq.         Total     ξη × freq.
           +      −

   1      15               15          15
   2      12      3         9          18
   3       2      1         1           3
   4       8                8          32
   6      12      2        10          60
   9       1                1           9

                              Total = 137

Whence

r = [S(ξη) − n·d1·d2] / (n·σx·σy)
  = (137 + 100 × 0.03 × 0.03) / (100 × 1.47 × 1.55)
  = 0.6 approximately.
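The whole of this calculation may be repeated mechanically as a check. The sketch below is not part of the method of the text: it assumes that the frequencies are those read from the correlation table of the example, with A = 5 gms. and B = 5 days, and the names used are illustrative.

```python
from math import sqrt

A, B = 5, 5  # arbitrary values: 5 gms. of sugar, 5 days of exposure

# table[days][grams] = number of plants, as read from the example's table.
table = {
    2: {2: 1, 3: 1, 4: 2, 5: 1, 7: 1},
    3: {2: 3, 3: 2, 4: 4, 6: 2},
    4: {3: 4, 4: 8, 5: 3},
    5: {2: 1, 3: 2, 4: 6, 5: 18, 6: 9, 8: 1},
    6: {2: 1, 3: 1, 5: 2, 6: 7, 7: 3},
    7: {5: 1, 6: 1, 7: 6, 8: 5},
    8: {3: 1, 7: 3},
}

# One entry per occupied compartment: (xi, eta, frequency).
cells = [(g - A, d - B, f) for d, row in table.items() for g, f in row.items()]

n = sum(f for _, _, f in cells)
d1 = sum(f * xi for xi, _, f in cells) / n
d2 = sum(f * eta for _, eta, f in cells) / n
sigma_x = sqrt(sum(f * xi ** 2 for xi, _, f in cells) / n - d1 ** 2)
sigma_y = sqrt(sum(f * eta ** 2 for _, eta, f in cells) / n - d2 ** 2)
s_xi_eta = sum(f * xi * eta for xi, eta, f in cells)

r = (s_xi_eta - n * d1 * d2) / (n * sigma_x * sigma_y)
print(n, s_xi_eta, round(r, 2))
```

With these frequencies the sketch gives n = 100 and S(ξη) = 137, and a value of r which agrees with that found above.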


Exercises. 

1. What is the correlation between the number of whole vertebrae
and posterior spines in male embryos of Spinax niger? (Table 22:
Biometrika, III, 1904, p. 355.)


No. of                            No. of vertebrae
posterior
spines       41   42   43   44   45   46   47   48   49   50   51    Total

  43                                   1                        1       2
  42                         1    2    8    7                          18
  41                    1    8   25   11                               45
  40               3   10   34   15                                    62
  39          1    4    8    5                                         18

 Total        1    7   19   48   42   20    7                   1     145

2. What is the correlation between the weight and height of the
boys of whom details are given in the following table? (Table:
Biometrika, II, 1915, p. 62.)

Boys aged 6½ to 7½ years.


Height                           Weight in lbs.
in
inches       26    31    36    41    46    51    56    61    Total

  31          1     1           1                                3
  34                8     6     2                               16
  37                6    37    10     2                         55
  40                4    93   178    35     2                  312
  43                1    22   220   294    61     3            601
  46          1                21   133   138    38     1      332
  49                            2          13    12     4       31
  52                                        1           1        2

 Total        2    20   158   434   464   215    53     6     1352

42. With large samples the standard error of the coefficient
of correlation (σr) is given by the expression (1 − r²)/√n,
if the distribution is normal; but for small samples this
expression is not very exact. The reliability of the coefficient
of correlation obviously varies with the number of individuals
in the sample. The graph of Appendix IV, which has
been prepared from table V (a), p. 174 of Fisher's Statistical
Methods for Research Workers (1st Edition), shows
the value of r which, for different values of n, may be expected
to occur once in a hundred times as a mere matter of chance by
random sampling from an uncorrelated population. Values
greater than these will be fairly significant.
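As an illustration of the large-sample expression only, the following sketch evaluates (1 − r²)/√n. The values r = 0.6 and n = 100 are taken from the example of para. 41; the function name is an assumption made for the illustration.

```python
from math import sqrt


def standard_error_of_r(r: float, n: int) -> float:
    """Large-sample standard error of r, i.e. (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)


se = standard_error_of_r(0.6, 100)
print(round(se, 3))  # (1 - 0.36) / 10 = 0.064
```

For small samples the expression is not very exact, and the graph of Appendix IV should be preferred.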


Exercise. 

Which  of  the  correlation  coefficients  so  far  calculated  as  examples 
or  as  exercises  may  be  regarded  as  significant  ? 

43. Simpler formula for r. Prof. Spearman showed that
the arithmetical work entailed in the calculation of a
correlation coefficient was very materially reduced if the
“Ranks” of the individuals, not their scores, were considered.
The correlation between the ranks is given by the formula

r = 1 − 6·S(d²) / n(n² − 1),

where (d) is the difference in the rank of an individual in
the two qualities which are to be compared.

Example.  The  ranks  of  the  hypothetical  seven  pupils  in  the 
subjects  P  and  Q,  of  para.  39,  are 


Pupil       Rank in         d       d²
           P       Q

  a        1       3        2        4
  b        2       5        3        9
  c        3       2        1        1
  d        4       1        3        9
  e        5       4        1        1
  f        6       7        1        1
  g        7       6        1        1

                        Total  =    26


Whence  r = 1 − (6 × 26)/(7 × 48)

          = 0.536.

This result is very close to the value of 0.542 found previously.
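The rank formula lends itself to a very short check. The sketch below uses the ranks of the seven pupils given in the example; the function name is chosen for the illustration only.

```python
def rank_correlation(ranks_p, ranks_q):
    """Return (r, S(d^2)) using r = 1 - 6 S(d^2) / n (n^2 - 1)."""
    n = len(ranks_p)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ranks_p, ranks_q))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


# Ranks of pupils a to g in subjects P and Q, as in the example.
rank_p = [1, 2, 3, 4, 5, 6, 7]
rank_q = [3, 5, 2, 1, 4, 7, 6]

r, sum_d2 = rank_correlation(rank_p, rank_q)
print(sum_d2, round(r, 3))  # 26 0.536
```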

N.  B.  When  two  or  more  individuals  have  the  same  rank,  each 
will  be  given  his  mean  rank.  The  total  number  of  ranks  will  equal 
the  number  of  individuals. 
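The rule for equal ranks may be sketched as follows. The scores are hypothetical, and it is assumed, for the illustration, that rank 1 is given to the highest score.

```python
def mean_ranks(scores):
    """Rank from the highest score (rank 1) down; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to the last position holding the same score as position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of the ranks i+1 ... j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


scores = [70, 85, 85, 60, 90]  # hypothetical scores of five individuals
print(mean_ranks(scores))      # [4.0, 2.5, 2.5, 5.0, 1.0]
```

The two individuals who tie for the second and third places are each given the rank 2.5, and the sum of all the ranks is the same as if there had been no tie.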

44.  The  value  of  r  obtained  by  using  the  “Rank”  method 
is  not  so  exact  as  that  obtained  from  the  actual  data. 


Exercise. 

Find, by the rank method, the correlations between the records
at the end of paras. 39 and 40.

APPENDIX I


The probability (P) that a given value of χ² may occur as a matter of
chance, for various values of n′ (shown in brackets).


APPENDIX II


Percentage number of frequencies which lie between the mean and a
distance x/σ to one side, when the distribution is normal: e.g. 47.7%
of the frequencies lie between the mean and a distance = 2σ to one side.
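The percentage described here may also be computed directly. The sketch below is an illustration only: it uses the error function of the Python standard library, and the function name is an assumption.

```python
from math import erf, sqrt


def percent_between_mean_and(x_over_sigma: float) -> float:
    """Percentage of a normal distribution between the mean and x/sigma to one side."""
    return 100 * 0.5 * erf(x_over_sigma / sqrt(2))


print(round(percent_between_mean_and(2.0), 1))  # 47.7
```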
APPENDIX III


Values of x/σm between which and the mean 49.5% of the frequencies lie,
for various values of n; e.g. 99% of the means found on the results of
20 observations would be expected to lie within the range
x/σm = ±2.86 of the true mean as a matter of chance (para. 32).

If this curve is used for testing the significance of the difference of
means of small samples the value taken for (n) will be one less than the
total number of observations in the two samples (para. 35).

APPENDIX IV


[Graph not reproduced. Horizontal axis: No. of pairs of observations (n).]
APPENDIX V

LIST OF SYMBOLS

Chapter I

(f)     Observed frequency (para. 9)
(fe)    Expected frequency (para. 9)
(N)     No. of events (para. 6)
(n)     No. of individuals (para. 6)
(p)     Chance of a success (para. 6)
(q)     Chance of a failure (para. 6)
(χ²)    [vide formula (para. 9)]

Chapter II et seq.

(a1, a2, a3, ...) and (b1, b2, b3, ...)
        actual quantities (para. 16)
(A) and (B)
        arbitrarily chosen measures from which differences in series
        (a1, a2, a3, ...) and (b1, b2, b3, ...) are calculated
        (paras. 17 and 40)
(d)     difference between mean and some other value A (para. 17)
(dm)    difference between means of two series (para. 33)
(f)     frequency of occurrence (para. 18)
(m)     arithmetic mean (paras. 30 and 39)
(n)     number of terms, e.g. in para. 16
(r)     coefficient of correlation (para. 38)
S( )    sum of quantities similar to that in brackets, e.g. in para. 16
(x1, x2, x3, ...)
        differences between any quantities or measures and their mean
        (paras. 17 and 30)
σ       standard deviation (para. 24)
σd      standard error of difference of means (para. 34)
σm      standard error of mean (para. 30)
σs      standard deviation of small sample (para. 32)
(ξ1, ξ2, ξ3, ...)
        differences between quantities (a1, a2, a3, ...) and arbitrary
        value A (para. 17)
(η1, η2, η3, ...)
        differences between quantities (b1, b2, b3, ...) and arbitrary
        value B (para. 40)
FORMULAE

                                                                 Para.

Binomial distribution
  N·q^n ;  N·n·q^(n−1)·p ;  N·{n(n − 1)/(1·2)}·q^(n−2)·p² ;  &c.       6

Goodness of fit.  χ² = Sum of all quantities (f − fe)²/fe            9

Arithmetic Mean = S(a)/n                                          16
S(x)    = 0                                                       17
(d)     = S(ξ)/n                                                  17
        = S(fξ)/n                                                 18
(m. v.) = S(x)/n; disregarding signs                              23
(σ)     = √{S(x²)/n}                                              24
        = √[{S(ξ²)/n} − d²]                                       26
        = √[{S(fξ²)/n} − d²]                                      27
(σm)    = σ/√n                                                    30
(σs)    = √{S(x²)/(n − 1)}                                        32
(σd)    = √(σm1² + σm2²)                                          34
(r)     = S(xy)/(n·σx·σy)                                         39
        = {S(ξη) − n·d1·d2}/(n·σx·σy)                             40
        = 1 − {6·S(d²)/n(n² − 1)}                                 43
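The successive terms of the binomial distribution may be generated as in the following sketch. The values N = 100, n = 10 and p = q = ½ are assumed purely for illustration, and the function name is hypothetical.

```python
from math import comb


def binomial_frequencies(N: int, n: int, p: float) -> list:
    """Expected frequencies N * C(n, k) * q^(n-k) * p^k for k = 0, 1, ..., n."""
    q = 1 - p
    return [N * comb(n, k) * q ** (n - k) * p ** k for k in range(n + 1)]


# Assumed values: 100 events, 10 individuals in each, equal chances.
terms = binomial_frequencies(N=100, n=10, p=0.5)
print([round(t, 2) for t in terms])
print(round(sum(terms), 6))  # the terms together account for all N events
```

The first term is N·q^n, the second N·n·q^(n−1)·p, and so on, as in the formula above.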


LOGARITHM  TABLES 


The copyright of that portion of the Logarithm Table on p. 50 which gives the logarithms
of numbers from 1000 to 2000 is the property of Messrs. Macmillan & Company,
Limited, who, however, have authorized the use of the form in any reprint published for
educational purposes.


LOGARITHMS


       0     1     2     3     4     5     6     7     8     9       1  2  3  4  5  6  7  8  9

10   0000  0043  0086  0128  0170                                    4  9 13 17 21 26 30 34 38
                                   0212  0253  0294  0334  0374      4  8 12 16 20 24 28 32 37

11   0414  0453  0492  0531  0569                                    4  8 12 15 19 23 27 31 35
                                   0607  0645  0682  0719  0755      4  7 11 15 19 22 26 30 33

12   0792  0828  0864  0899  0934  0969                              3  7 11 14 18 21 25 28 32
                                         1004  1038  1072  1106      3  7 10 14 17 20 24 27 31

13   1139  1173  1206  1239  1271                                    3  7 10 13 16 20 23 26 30
                                   1303  1335  1367  1399  1430      3  7 10 12 16 19 22 25 29
ANSWERS

Para.

6. (1) 201; (2) 2.3; (3) 0.098; 0.98; 4.40; 11.72; 20.50; 24.60;
20.50; &c.

11. (1) On 45% occasions, if frequencies (0; 1) and (4; 5) are grouped:
(n′ = 4; χ² = 2.7). If no grouping on 0.5% occasions: (n′ = 6;

Z2  =  W  ' 

(2) P = 0.40: Such a divergence might be expected on 2 out of
every 5 repetitions. (Group together 3, 4, 5.)

(3) Yes: Such a distribution might be expected on 96 out of
every 100 drawings. (Group together 5, 6, 7, 8.)

12. Such a distribution is highly improbable as a matter of chance. The
physical conditions produce effects which can be discriminated.
(χ² = 28.52; P = 0.000001.)

(2) No. (n′ = 2; χ² = 323.5.)

17. (1) 2.8.

(4) a, 13; b, 0.6; c, 1.9.

(5) Group I, 27.53; Group II, 24.94.

20. (3, a, 1) 2.74; (3, a, 2) 0.94; (3, b) 2.03.

(4) Americans, 57.2 years; Others, 69.9 years.

(5) 4.5.

22. Taking quartiles as 80 and 220 approximately, σ = 104.

23. (a) 2.44; (b) 5.37.

24. (1) 3.0 approx.; (2) 6.18.

(2) a, 0.71; b, 0.95; c, 10.

26. (1, a) 3.0 approx.; (1, b) 6.18; (1, c) 4.2. N.B. (a) and (b) are the
exercises of para. 24.

(2) a, 4.27; b, 5.00.

28. (1) σ = 0.97, for the eggs; σ = 1.18, for the 100 throws; σ = 0.81,
for the 50 throws; σ = 1.17 for the hearts.

(2) a, 14.9; b, 14.85; (3) 0.8.

29. (1) 13.4: 6.7 will be greater and 6.7 will be less than the mean
by 12 units; (2) 69.1; (3) 16.1.

(4) a, ±8.98; b, ±8.95; c, ±0.54.

30. (1) Yes; x/σm = 137; 162; 35.3; 20.

(2) On 2% occasions error of mean would be 1% or more; (3) 144;

(4) 64.

(5) No; x/σm = 12.5.

32. (1) 0.26; 7.41 ± 0.26.

(2) a, σs = 0.75; 1.01; 1.06


(1) Yes; mean / s. e. of mean = 5.2

(2) No; mean / s. e. of mean = 1.8

(3) Yes; mean / s. e. of mean = 5.4


34. (1) Probably not; dm/σd = 2/0717.

(2) Such difference is to be expected, as a matter of chance, on
6% occasions; (3) Yes; 0.096 as a minimum.

(4) Yes; dm/σd = 4.2

(5) Probably not; dm/σd = 1.76

(The numbers are comparatively small: such a divergence
might be expected as a mere matter of chance on 8% occasions.)

36. Significant. dm/σd = 4.

(2) No; dm/σd = 2.7

(For n = 17 a value of at least 2.72 is necessary.)

N.B.  The  differences  between  the  individual  figures  of  exercise 
4(b)  and  (c)  give  the  series  4(a).  The  mean  of  this  series  is 
significant,  vide  exercise  2(a),  para.  32.  This  is  of  interest.  Sup¬ 
pose  the  data  of  4  (b)  and  (c)  had  been  obtained  from  the  same 
individuals,  that  they  are,  for  example,  the  data  of  the  per¬ 
centage  increase  of  eggs  laid  by  nine  hens  when  fed  with  two 
kinds  of  food,  the  difference  in  the  feeding  being  the  only  difference. 
We  would  then  be  justified  in  deducing  that  the  increase  was 
statistically  significant,  vide  exercise  2(a),  para.  32.  But  if  dif¬ 
ferent  hens  were  used  the  resulting  difference  in  egg-laying  is  not 
statistically  significant. 

39. (1) +0.79; (2) +0.54; (3) −0.58.

40. (1) −0.42 (d1 = +11.1; d2 = −8.4; σx = 25.3; σy = 28.0)

(2) +0.39.

41. (1) +0.77; (2) +0.71.

42. Para. 39. Example. Not significant; Exercise 1, Significant;
2, Not significant; 3, Not significant.

Para. 40 (1) Not significant; (2) Not significant.

Para. 41 Example. Significant; Exercise (1), Significant; (2)
Significant.

44. Para. 39 (1) 0.64. This is smaller than the true value of (r);
(2) 0.4; (3) 0.76. This value is too big.

Para. 40 (1) 0.32, calling rank 1 the smallest R. T. and the
greatest number of taps. This is smaller than the true value
(of r). It is +ve because the shortest R. T. has been ranked
first.

(2) 0.6. This result is very far out. It is much too big.
Mean, Arithmetic . . . . . . . . . . . . . . . 16

Mean, Arithmetic, Significance of . . . . . . . 30

Mean variation or deviation . . . . . . . . . . 23

Probable error . . . . . . . . . . . . . . . . 29

Quartiles . . . . . . . . . . . . . . . . . . . 22

Representative measures . . . . . . . . . . . . 13

Significance of mean . . . . . . . . . . . . . 30, 32

Spearman . . . . . . . . . . . . . . . . . . . 43

Standard deviation . . . . . . . . . . . . . . 24